Fault Tolerant File Models for MPI-IO Parallel File Systems

Authors

  • Alejandro Calderón
  • Félix García Carballeira
  • Florin Isaila
  • Rainer Keller
  • Alexander Schulz
Abstract

Parallelism in file systems is obtained by using several independent server nodes, each supporting one or more secondary storage devices. This approach increases the performance and scalability of the system, but a fault in a single node can make the whole system fail. To avoid this problem, data must be stored using some kind of redundancy technique, so that it can be recovered in case of failure. Fault tolerance can be provided in I/O systems by using replication or RAID-based schemes. However, most current systems apply the same fault tolerance technique at the disk or file system level. This paper describes how fault tolerance support can be used by MPI applications based on PVFS version 2 [1], a well-known parallel file system for clusters. This support can be applied to other parallel file systems with many benefits: fault tolerance at the file level, flexible definition of new fault tolerance schemes, and dynamic reconfiguration of the fault tolerance policy.

Similar resources

Reliable MPI-IO through Layout-Aware Replication

The current deployment of petascale systems and the promise of future exascale systems have created unprecedented challenges in how to manage failures in such systems. While many parallel file systems provide some sort of redundancy mechanism to cope with failures, such systems rely heavily on a hardware-based solution such as RAID. In this paper, we propose a block replication approach to stor...

The Impact of File Systems on MPI-IO Scalability

As the number of nodes in cluster systems continues to grow, leveraging scalable algorithms in all aspects of such systems becomes key to maintaining performance. While scalable algorithms have been applied successfully in some areas of parallel I/O, many operations are still performed in an uncoordinated manner. In this work we consider, in three file system scenarios, the possibilities for ap...

A Windows-Based Parallel File System

Parallel file systems are widely used in clusters to provide high performance I/O. However, most of the existing parallel file systems are based on UNIX-like operating systems. We use the Microsoft .NET framework to implement a parallel file system for Windows. We also implement a file system driver to support existing applications written with Win32 APIs. In addition, a preliminary MPI-IO libr...

Performance Evaluation of Collective Write Algorithms in MPI I/O

MPI is the de-facto standard for message passing in parallel scientific applications. MPI-IO is a part of the MPI-2 specification defining file I/O operations in the MPI world. MPI-IO enables performance optimizations for collective file I/O operations as it acts as a portability layer between the application and the file system. The goal of this study is to optimize collective file I/O operati...

Efficient Structured Data Access in Parallel File Systems

Parallel scientific applications store and retrieve very large, structured datasets. Directly supporting these structured accesses is an important step in providing high-performance I/O solutions for these applications. High-level interfaces such as HDF5 and Parallel netCDF provide convenient APIs for accessing structured datasets, and the MPI-IO interface also supports efficient access to stru...

Publication date: 2007